Antipattern: Sort Data Randomly

Let's see the techniques used to sort the data that causes an antipattern.

We'll cover the following

Table scan

The most common SQL trick to pick a random row from a query is to sort the query randomly and pick the first row. This technique is easy to understand and easy to implement:

Enter to Rename, Shift+Enter to Preview

Picking up a random row from the Bugs table using RAND()

Although this is a popular solution, it quickly shows its weakness. To understand this weakness, let’s first compare it to conventional sorting, in which we compare values in a column and order the rows according to the size of the value in that column. This kind of sorting is repeatable in that it produces the same results when we run it more than once. It also benefits from an index because an index is essentially a presorted set of the values from a given column.

Enter to Rename, Shift+Enter to Preview

Retrieving data from the Bugs table

Using a function that returns a random value per row makes it random whether a given row is greater or less than another row. So, the order has no relation to the values in each row. The order is also different each time we sort in this way. So far, so good — this is the result that we want.

Sorting by a non-deterministic expression (RAND()) means that the sorting cannot benefit from an index. There is no index containing the values returned by the random function. That’s the point of their being random: they are different and unpredictable each time they’re selected.

Table scan#

The sort-by-random technique is a problem for the performance of the query because using an index is one of the best ways to speed up sorting. The consequence of not using an index is that the query result set must be sorted by the database “manually.” This is called a table scan, and it often involves saving the entire result as a temporary table and sorting it by physically swapping rows. A table scan sort is much slower than an index-assisted sort, and the performance difference grows with the size of the data set.

Another weakness of the sort-by-random technique is that after the expensive process of sorting the entire data set, most of that work is wasted because all but the first row is immediately discarded. In a table with a thousand rows, why go to the trouble of randomizing all thousand when all we need is one row?

Both of these problems are unnoticeable when we run the query over a small number of rows, so during development and testing, it may appear to be a good solution. But as the volume in our database increases over time, the query fails to scale well.

Synopsis: Random Selection

Solution: In No Particular Order

Loginto save progress

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Antipattern: Sort Data Randomly

Table scan#